Vendor Performance Analysis
Importing necessary libraries¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"
import warnings
import sqlite3
from scipy.stats import ttest_ind
import scipy.stats as stats
warnings.filterwarnings('ignore')
Loading Dataset¶
In [2]:
# Creating connection
conn = sqlite3.connect('inventory.db')
# fetching vendor summary data
df = pd.read_sql_query("select * from vendor_sales_summary", conn)
df.head()
Out[2]:
| VendorNumber | VendorName | Brand | Description | PurchasePrice | ActualPrice | Volume | TotalPurchaseQuantity | TotalPurchaseDollars | TotalSalesQuantity | TotalSalesDollars | TotalSalesPrice | TotalExciseTax | FreightCost | GrossProfit | ProfitMargin | StockTurnover | SalesPurchaseRatio | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | IRA GOLDMAN AND WILLIAMS, LLP | 90085 | Ch Lilian 09 Ladouys St Este | 23.86 | 36.99 | 750.0 | 8 | 190.88 | 18.0 | 665.82 | 295.92 | 2.00 | 27.08 | 474.94 | 71.331591 | 2.250000 | 3.488160 |
| 1 | 2 | IRA GOLDMAN AND WILLIAMS, LLP | 90609 | Flavor Essence Variety 5 Pak | 17.00 | 24.99 | 162.5 | 320 | 5440.00 | 24.0 | 599.76 | 449.82 | 0.52 | 27.08 | -4840.24 | -807.029478 | 0.075000 | 0.110250 |
| 2 | 54 | AAPER ALCOHOL & CHEMICAL CO | 990 | Ethyl Alcohol 200 Proof | 105.07 | 134.49 | 3750.0 | 1 | 105.07 | 0.0 | 0.00 | 0.00 | 0.00 | 0.48 | -105.07 | -inf | 0.000000 | 0.000000 |
| 3 | 60 | ADAMBA IMPORTS INTL INC | 771 | Bak's Krupnik Honey Liqueur | 11.44 | 14.99 | 750.0 | 39 | 446.16 | 47.0 | 704.53 | 494.67 | 37.01 | 367.52 | 258.37 | 36.672675 | 1.205128 | 1.579097 |
| 4 | 60 | ADAMBA IMPORTS INTL INC | 3401 | Vesica Vodka | 11.10 | 14.99 | 1750.0 | 6 | 66.60 | 0.0 | 0.00 | 0.00 | 0.00 | 367.52 | -66.60 | -inf | 0.000000 | 0.000000 |
Exploratory Data Analysis¶
Previously we examined the various tables in the database to identify key variables, understand their relationships and determine which one should be included in the final analysis.
In this phase of EDA we'll analyse the resultant table to gain insights into the distribution of each column. This will help us understand data patterns, identify anomalies and ensure data quality defore proceeding with further analysis.
In [3]:
# Summary statistics
df.describe().T
Out[3]:
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| VendorNumber | 10692.0 | 1.065065e+04 | 18753.519148 | 2.00 | 3951.000000 | 7153.000000 | 9552.000000 | 2.013590e+05 |
| Brand | 10692.0 | 1.803923e+04 | 12662.187074 | 58.00 | 5793.500000 | 18761.500000 | 25514.250000 | 9.063100e+04 |
| PurchasePrice | 10692.0 | 2.438530e+01 | 109.269375 | 0.36 | 6.840000 | 10.455000 | 19.482500 | 5.681810e+03 |
| ActualPrice | 10692.0 | 3.564367e+01 | 148.246016 | 0.49 | 10.990000 | 15.990000 | 28.990000 | 7.499990e+03 |
| Volume | 10692.0 | 8.473605e+02 | 664.309212 | 50.00 | 750.000000 | 750.000000 | 750.000000 | 2.000000e+04 |
| TotalPurchaseQuantity | 10692.0 | 3.140887e+03 | 11095.086769 | 1.00 | 36.000000 | 262.000000 | 1975.750000 | 3.376600e+05 |
| TotalPurchaseDollars | 10692.0 | 3.010669e+04 | 123067.799627 | 0.71 | 453.457500 | 3655.465000 | 20738.245000 | 3.811252e+06 |
| TotalSalesQuantity | 10692.0 | 3.077482e+03 | 10952.851391 | 0.00 | 33.000000 | 261.000000 | 1929.250000 | 3.349390e+05 |
| TotalSalesDollars | 10692.0 | 4.223907e+04 | 167655.265984 | 0.00 | 729.220000 | 5298.045000 | 28396.915000 | 5.101920e+06 |
| TotalSalesPrice | 10692.0 | 1.879378e+04 | 44952.773386 | 0.00 | 289.710000 | 2857.800000 | 16059.562500 | 6.728193e+05 |
| TotalExciseTax | 10692.0 | 1.774226e+03 | 10975.582240 | 0.00 | 4.800000 | 46.570000 | 418.650000 | 3.682428e+05 |
| FreightCost | 10692.0 | 6.143376e+04 | 60938.458032 | 0.09 | 14069.870000 | 50293.620000 | 79528.990000 | 2.570321e+05 |
| GrossProfit | 10692.0 | 1.213238e+04 | 46224.337964 | -52002.78 | 52.920000 | 1399.640000 | 8660.200000 | 1.290668e+06 |
| ProfitMargin | 10692.0 | -inf | NaN | -inf | 13.324515 | 30.405457 | 39.956135 | 9.971666e+01 |
| StockTurnover | 10692.0 | 1.706793e+00 | 6.020460 | 0.00 | 0.807229 | 0.981529 | 1.039342 | 2.745000e+02 |
| SalesPurchaseRatio | 10692.0 | 2.504390e+00 | 8.459067 | 0.00 | 1.153729 | 1.436894 | 1.665449 | 3.529286e+02 |
In [4]:
# Select numeric columns only
numeric_cols = df.select_dtypes(include='number').columns.tolist()
# Create 4x4 subplots
fig = make_subplots(
rows=4, cols=4,
subplot_titles=numeric_cols)
# Add histogram for each numeric column
for i, col in enumerate(numeric_cols):
row = i // 4 + 1
col_pos = i % 4 + 1
fig.add_trace(
go.Histogram(x=df[col], nbinsx=50, name=col),
row=row, col=col_pos)
# Update layout
fig.update_layout(
height=1200, width=1200,
title_text="4x4 Numeric Plot Matrix",
showlegend=False)
fig.show()
In [5]:
# Outliers detection with Boxplot
# Create 4x4 subplots
fig = make_subplots(
rows=4, cols=4,
subplot_titles=numeric_cols)
# Add box plot for each numeric column
for i, col in enumerate(numeric_cols):
row = i // 4 + 1
col_pos = i % 4 + 1
fig.add_trace(
go.Box(y=df[col], name=col, boxpoints='outliers'),
row=row, col=col_pos)
# Update layout
fig.update_layout(
height=1200, width=1200,
title_text="4x4 Box Plot Matrix",
showlegend=False)
fig.show()
Summary Statistics Insights:¶
Negative and Zero Values:¶
- Gross Profit : Minimumvalue is -52,002.78 , indicating losses. Some products or transections may be sellling at a loss due to high costs or selling at discounts lower than the purchase price.
- Profit Margin : It has a minimun of
-infinity, which suggests cases where revenue isZeroor even lower than costs. - Total Sales Quantuty & Dollars : Minimum values are
Zero, meaning some products were purchased but never sold. These could be clow moving or obsolete stock.
Outliers indicated by high Standard Deviations :¶
Purchase & Actual Price: Themaxvalues (5,681.81 & 7,499.99) are significantly higher than themean(24.39 & 35.64), indicating potential premium product.Freight Cost: Huge Variation, from 0.09 to 2,57,032.07, suggests logistics inefficiencies or bulk shipments.Stock Turnover: Ranges from 0 to 274.5, implying some product were sold extremely fast while others remain in stock indefinitely. Value more than 1 sold quantity for that product is higher than purchased quantity due to either sales are being fulfilled from older stock.
In [6]:
# filtering the data by removing inconsistencies
df = pd.read_sql_query("""
SELECT *
FROM vendor_sales_summary
WHERE GrossProfit > 0
AND ProfitMargin > 0
AND TotalSalesQuantity > 0""", conn)
df.head()
Out[6]:
| VendorNumber | VendorName | Brand | Description | PurchasePrice | ActualPrice | Volume | TotalPurchaseQuantity | TotalPurchaseDollars | TotalSalesQuantity | TotalSalesDollars | TotalSalesPrice | TotalExciseTax | FreightCost | GrossProfit | ProfitMargin | StockTurnover | SalesPurchaseRatio | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | IRA GOLDMAN AND WILLIAMS, LLP | 90085 | Ch Lilian 09 Ladouys St Este | 23.86 | 36.99 | 750.0 | 8 | 190.88 | 18.0 | 665.82 | 295.92 | 2.00 | 27.08 | 474.94 | 71.331591 | 2.250000 | 3.488160 |
| 1 | 60 | ADAMBA IMPORTS INTL INC | 771 | Bak's Krupnik Honey Liqueur | 11.44 | 14.99 | 750.0 | 39 | 446.16 | 47.0 | 704.53 | 494.67 | 37.01 | 367.52 | 258.37 | 36.672675 | 1.205128 | 1.579097 |
| 2 | 105 | ALTAMAR BRANDS LLC | 2529 | Right Gin | 23.25 | 29.99 | 750.0 | 12 | 279.00 | 12.0 | 359.88 | 59.98 | 9.44 | 62.39 | 80.88 | 22.474158 | 1.000000 | 1.289892 |
| 3 | 105 | ALTAMAR BRANDS LLC | 8412 | Tequila Ocho Plata Fresno | 35.71 | 49.99 | 750.0 | 320 | 11427.20 | 307.0 | 15346.93 | 12947.41 | 242.15 | 62.39 | 3919.73 | 25.540808 | 0.959375 | 1.343018 |
| 4 | 200 | AMERICAN SPIRITS EXCHANGE | 20789 | Zin-phomaniac Znfdl | 9.73 | 14.99 | 750.0 | 96 | 934.08 | 84.0 | 1511.16 | 287.84 | 9.43 | 6.19 | 577.08 | 38.187882 | 0.875000 | 1.617806 |
In [7]:
# Select numeric columns only
numeric_cols = df.select_dtypes(include='number').columns.tolist()
# Create 4x4 subplots
fig = make_subplots(
rows=4, cols=4,
subplot_titles=numeric_cols)
# Add histogram for each numeric column
for i, col in enumerate(numeric_cols):
row = i // 4 + 1
col_pos = i % 4 + 1
fig.add_trace(
go.Histogram(x=df[col], nbinsx=50, name=col),
row=row, col=col_pos)
# Update layout
fig.update_layout(
height=1200, width=1200,
title_text="4x4 Numeric Plot Matrix",
showlegend=False)
fig.show()
In [8]:
df.columns
Out[8]:
Index(['VendorNumber', 'VendorName', 'Brand', 'Description', 'PurchasePrice',
'ActualPrice', 'Volume', 'TotalPurchaseQuantity',
'TotalPurchaseDollars', 'TotalSalesQuantity', 'TotalSalesDollars',
'TotalSalesPrice', 'TotalExciseTax', 'FreightCost', 'GrossProfit',
'ProfitMargin', 'StockTurnover', 'SalesPurchaseRatio'],
dtype='object')
In [9]:
# Plotting categorical columns
category_cols= ['VendorName', 'Description']
# Create subplots
fig = make_subplots(
rows=1, cols=2,
subplot_titles=category_cols)
vendor_counts= df['VendorName'].value_counts().head(10)
desc_counts= df['VendorName'].value_counts().head(10)
# Add VendorName bar chart
fig.add_trace(
go.Bar(x=vendor_counts.index, y=vendor_counts.values, name="VendorName"),
row=1, col=1
)
# Add Description bar chart
fig.add_trace(
go.Bar(x=desc_counts.index, y=desc_counts.values, name="Description"),
row=1, col=2
)
# bar chart for top 10 VendorName & Description
fig.update_layout(height=800, width=800)
In [10]:
# Correlation Heatmap
corr_matrix = df[numeric_cols].corr()
# plot heatmap
fig= px.imshow(
corr_matrix,
text_auto=True,
color_continuous_scale="RdBu_r",
title="Correlation Heatmap of Numeric Features"
)
fig.update_layout(height=800, width=800)
fig.show()
Correlation Insights¶
- Purchase Price has weak correlation with ToatalSalesDollars & GrossProfit. That indicates price variation do not significantly impact sales revenue or profit.
- Strong correlation between total purchase quantity and total sales quantity(.999), which confirms efficient inventory turnover.
- The
Data Analysis¶
- Identify Brands that needs promotional or pricing adjustments which exhibits lower sales performance but higher profit mergin.
In [11]:
brand_performance= df.groupby('Description').agg({
'TotalSalesDollars': 'sum',
'ProfitMargin' : 'mean'
}).reset_index()
In [12]:
low_sales_threshold= brand_performance['TotalSalesDollars'].quantile(0.15)
high_profit_threshold= brand_performance['ProfitMargin'].quantile(0.85)
In [13]:
# Filtering brands with low sales but high profit margin
target_brands = brand_performance[
(brand_performance['TotalSalesDollars'] <= low_sales_threshold) &
(brand_performance['ProfitMargin'] >= high_profit_threshold)
]
print('Brands with Low sales but High profit Margins:')
display(target_brands.sort_values('TotalSalesDollars'))
Brands with Low sales but High profit Margins:
| Description | TotalSalesDollars | ProfitMargin | |
|---|---|---|---|
| 6199 | Santa Rita Organic Svgn Bl | 9.99 | 66.466466 |
| 2369 | Debauchery Pnt Nr | 11.58 | 65.975820 |
| 2070 | Concannon Glen Ellen Wh Zin | 15.95 | 83.448276 |
| 2188 | Crown Royal Apple | 27.86 | 89.806174 |
| 6237 | Sauza Sprklg Wild Berry Marg | 27.96 | 82.153076 |
| ... | ... | ... | ... |
| 5074 | Nanbu Bijin Southern Beauty | 535.68 | 76.747312 |
| 2271 | Dad's Hat Rye Whiskey | 538.89 | 81.851584 |
| 57 | A Bichot Clos Marechaudes | 539.94 | 67.740860 |
| 6245 | Sbragia Home Ranch Merlot | 549.75 | 66.444748 |
| 3326 | Goulee Cos d'Estournel 10 | 558.87 | 69.434752 |
198 rows × 3 columns
In [14]:
fig = px.scatter(
target_brands,
x="TotalSalesDollars",
y="ProfitMargin",
size="TotalSalesDollars",
color="ProfitMargin",
hover_data=["Description"], # show labels directly
title="Target Brands: Profit Margin vs Total Sales"
)
# Add threshold lines
fig.add_vline(x=low_sales_threshold, line_dash="dash", line_color="red",
annotation_text="Sales Threshold", annotation_position="top left")
fig.add_hline(y=high_profit_threshold, line_dash="dash", line_color="blue",
annotation_text="Profit Threshold", annotation_position="bottom right")
fig.update_layout(height= 600, width= 800)
fig.show()
Which vendors and brands demonstrates the highest sales performance
In [15]:
def format_dollar(value):
if value >= 1000000:
return f'{value / 1000000:.2f}M'
elif value >= 1000:
return f'{value/1000:.2f}K'
else:
return str(value)
In [16]:
# Top vendors and brands by Performance
top_vendors= df.groupby('VendorName')['TotalSalesDollars'].sum().nlargest(10)
top_brands= df.groupby('Description')['TotalSalesDollars'].sum().nlargest(10)
top_vendors.apply(lambda x : format_dollar(x))
Out[16]:
VendorName DIAGEO NORTH AMERICA INC 67.99M MARTIGNETTI COMPANIES 39.33M PERNOD RICARD USA 32.06M JIM BEAM BRANDS COMPANY 31.42M BACARDI USA INC 24.85M CONSTELLATION BRANDS INC 24.22M E & J GALLO WINERY 18.40M BROWN-FORMAN CORP 18.25M ULTRA BEVERAGE COMPANY LLP 16.50M M S WALKER INC 14.71M Name: TotalSalesDollars, dtype: object
In [17]:
top_brands.apply(lambda x : format_dollar(x))
Out[17]:
Description Jack Daniels No 7 Black 7.96M Tito's Handmade Vodka 7.40M Grey Goose Vodka 7.21M Capt Morgan Spiced Rum 6.36M Absolut 80 Proof 6.24M Jameson Irish Whiskey 5.72M Ketel One Vodka 5.07M Baileys Irish Cream 4.15M Kahlua 3.60M Tanqueray 3.46M Name: TotalSalesDollars, dtype: object
In [18]:
# Visualizing top 10 vendors and brands
# Create subplots
fig = make_subplots(
rows=1, cols=2, )
# Add VendorName bar chart
fig.add_trace(
go.Bar(x=top_vendors.index, y=top_vendors.values, name="VendorName"),
row=1, col=1
)
# Add VendorName bar chart
fig.add_trace(
go.Bar(x=top_brands.index, y=top_brands.values, name="VendorName"),
row=1, col=2
)
fig.update_layout(height=600, width=1000, title_text="Top 10 Vendors and Brands by Total Sales")
fig.show()
Which vendor contribute the most to total purchase dollar¶
In [19]:
df.head()
Out[19]:
| VendorNumber | VendorName | Brand | Description | PurchasePrice | ActualPrice | Volume | TotalPurchaseQuantity | TotalPurchaseDollars | TotalSalesQuantity | TotalSalesDollars | TotalSalesPrice | TotalExciseTax | FreightCost | GrossProfit | ProfitMargin | StockTurnover | SalesPurchaseRatio | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | IRA GOLDMAN AND WILLIAMS, LLP | 90085 | Ch Lilian 09 Ladouys St Este | 23.86 | 36.99 | 750.0 | 8 | 190.88 | 18.0 | 665.82 | 295.92 | 2.00 | 27.08 | 474.94 | 71.331591 | 2.250000 | 3.488160 |
| 1 | 60 | ADAMBA IMPORTS INTL INC | 771 | Bak's Krupnik Honey Liqueur | 11.44 | 14.99 | 750.0 | 39 | 446.16 | 47.0 | 704.53 | 494.67 | 37.01 | 367.52 | 258.37 | 36.672675 | 1.205128 | 1.579097 |
| 2 | 105 | ALTAMAR BRANDS LLC | 2529 | Right Gin | 23.25 | 29.99 | 750.0 | 12 | 279.00 | 12.0 | 359.88 | 59.98 | 9.44 | 62.39 | 80.88 | 22.474158 | 1.000000 | 1.289892 |
| 3 | 105 | ALTAMAR BRANDS LLC | 8412 | Tequila Ocho Plata Fresno | 35.71 | 49.99 | 750.0 | 320 | 11427.20 | 307.0 | 15346.93 | 12947.41 | 242.15 | 62.39 | 3919.73 | 25.540808 | 0.959375 | 1.343018 |
| 4 | 200 | AMERICAN SPIRITS EXCHANGE | 20789 | Zin-phomaniac Znfdl | 9.73 | 14.99 | 750.0 | 96 | 934.08 | 84.0 | 1511.16 | 287.84 | 9.43 | 6.19 | 577.08 | 38.187882 | 0.875000 | 1.617806 |
In [20]:
vendor_performance= df.groupby('VendorName').agg({
'TotalSalesDollars': 'sum',
'GrossProfit': 'sum',
'TotalPurchaseDollars': 'sum'
}).reset_index()
In [21]:
vendor_performance['PurchaseContribution(%)'] = (vendor_performance['TotalPurchaseDollars'] / vendor_performance['TotalPurchaseDollars'].sum()) * 100
vendor_performance
Out[21]:
| VendorName | TotalSalesDollars | GrossProfit | TotalPurchaseDollars | PurchaseContribution(%) | |
|---|---|---|---|---|---|
| 0 | ADAMBA IMPORTS INTL INC | 704.53 | 258.37 | 446.16 | 0.000145 |
| 1 | ALISA CARR BEVERAGES | 104470.94 | 78772.82 | 25698.12 | 0.008362 |
| 2 | ALTAMAR BRANDS LLC | 15706.81 | 4000.61 | 11706.20 | 0.003809 |
| 3 | AMERICAN SPIRITS EXCHANGE | 1511.16 | 577.08 | 934.08 | 0.000304 |
| 4 | AMERICAN VINTAGE BEVERAGE | 139603.53 | 35167.85 | 104435.68 | 0.033981 |
| ... | ... | ... | ... | ... | ... |
| 114 | WEIN BAUER INC | 56217.13 | 13522.49 | 42694.64 | 0.013892 |
| 115 | WESTERN SPIRITS BEVERAGE CO | 405254.83 | 106837.97 | 298416.86 | 0.097097 |
| 116 | WILLIAM GRANT & SONS INC | 7569876.20 | 1693337.94 | 5876538.26 | 1.912074 |
| 117 | WINE GROUP INC | 8304043.28 | 3100242.11 | 5203801.17 | 1.693183 |
| 118 | ZORVINO VINEYARDS | 124189.59 | 38066.88 | 86122.71 | 0.028022 |
119 rows × 5 columns
In [22]:
vendor_performance = round(vendor_performance.sort_values('PurchaseContribution(%)', ascending= False), 2)
In [23]:
top_vendors = vendor_performance.head(10)
top_vendors['TotalSalesDollars'] = top_vendors['TotalSalesDollars'].apply(lambda x : format_dollar(x))
top_vendors['GrossProfit'] = top_vendors['GrossProfit'].apply(lambda x : format_dollar(x))
top_vendors['TotalPurchaseDollars'] = top_vendors['TotalPurchaseDollars'].apply(lambda x : format_dollar(x))
top_vendors
Out[23]:
| VendorName | TotalSalesDollars | GrossProfit | TotalPurchaseDollars | PurchaseContribution(%) | |
|---|---|---|---|---|---|
| 25 | DIAGEO NORTH AMERICA INC | 67.99M | 17.89M | 50.10M | 16.30 |
| 57 | MARTIGNETTI COMPANIES | 39.33M | 13.83M | 25.50M | 8.30 |
| 68 | PERNOD RICARD USA | 32.06M | 8.21M | 23.85M | 7.76 |
| 46 | JIM BEAM BRANDS COMPANY | 31.42M | 7.93M | 23.49M | 7.64 |
| 6 | BACARDI USA INC | 24.85M | 7.42M | 17.43M | 5.67 |
| 20 | CONSTELLATION BRANDS INC | 24.22M | 8.95M | 15.27M | 4.97 |
| 11 | BROWN-FORMAN CORP | 18.25M | 5.01M | 13.24M | 4.31 |
| 30 | E & J GALLO WINERY | 18.40M | 6.33M | 12.07M | 3.93 |
| 106 | ULTRA BEVERAGE COMPANY LLP | 16.50M | 5.34M | 11.17M | 3.63 |
| 53 | M S WALKER INC | 14.71M | 4.94M | 9.76M | 3.18 |
In [24]:
top_vendors['Cumulitive Purchase(%)'] = top_vendors['PurchaseContribution(%)'].cumsum()
top_vendors
Out[24]:
| VendorName | TotalSalesDollars | GrossProfit | TotalPurchaseDollars | PurchaseContribution(%) | Cumulitive Purchase(%) | |
|---|---|---|---|---|---|---|
| 25 | DIAGEO NORTH AMERICA INC | 67.99M | 17.89M | 50.10M | 16.30 | 16.30 |
| 57 | MARTIGNETTI COMPANIES | 39.33M | 13.83M | 25.50M | 8.30 | 24.60 |
| 68 | PERNOD RICARD USA | 32.06M | 8.21M | 23.85M | 7.76 | 32.36 |
| 46 | JIM BEAM BRANDS COMPANY | 31.42M | 7.93M | 23.49M | 7.64 | 40.00 |
| 6 | BACARDI USA INC | 24.85M | 7.42M | 17.43M | 5.67 | 45.67 |
| 20 | CONSTELLATION BRANDS INC | 24.22M | 8.95M | 15.27M | 4.97 | 50.64 |
| 11 | BROWN-FORMAN CORP | 18.25M | 5.01M | 13.24M | 4.31 | 54.95 |
| 30 | E & J GALLO WINERY | 18.40M | 6.33M | 12.07M | 3.93 | 58.88 |
| 106 | ULTRA BEVERAGE COMPANY LLP | 16.50M | 5.34M | 11.17M | 3.63 | 62.51 |
| 53 | M S WALKER INC | 14.71M | 4.94M | 9.76M | 3.18 | 65.69 |
Here we can see from the Cumulitive Purchase column that top 10 vendora are contributiing more than 65 % in TotalPurchaseDollars¶
In [25]:
figure_title= "Top 10 Vendors by Purchase Contribution"
fig = px.bar(
top_vendors,
x='VendorName',
y='PurchaseContribution(%)',
text='PurchaseContribution(%)',
title= figure_title,
color='PurchaseContribution(%)',
color_continuous_scale= 'RdBu_r'
)
fig.update_layout(height=600, width=800, coloraxis_showscale=False)
fig.show()
How much total procurement is dependent on the top vendors¶
In [26]:
print(f'Total purchase contribution of Top 10 veendors is {round(top_vendors['PurchaseContribution(%)'].sum(),2)} %')
Total purchase contribution of Top 10 veendors is 65.69 %
In [27]:
vendors= list(top_vendors['VendorName'].values)
purchase_contribution= list(top_vendors['PurchaseContribution(%)'].values)
total_contribution= sum(purchase_contribution)
remaining_contribution= 100 - total_contribution
# Append other vendors category
vendors.append('Other Vendors')
purchase_contribution.append(remaining_contribution)
fig = px.pie(
names=vendors,
values=purchase_contribution,
title='Purchase Contribution Distribution',
color_discrete_sequence=px.colors.sequential.RdBu
)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(height=600, width=800)
fig.show()
Does purchasing in bulk reduce the unit price and waht is the optimal purchase volume for cost saving.¶
In [28]:
df['UnitPurchasePrice'] = df['TotalPurchaseDollars'] / df['TotalPurchaseQuantity']
df['OrderSize']= pd.qcut(df['TotalPurchaseQuantity'], q=4, labels=['Small', 'Medium', 'Large', 'Very Large'])
In [29]:
df.groupby('OrderSize')['UnitPurchasePrice'].mean().reset_index()
Out[29]:
| OrderSize | UnitPurchasePrice | |
|---|---|---|
| 0 | Small | 46.677651 |
| 1 | Medium | 15.858090 |
| 2 | Large | 14.098238 |
| 3 | Very Large | 10.172097 |
Higher Order Size ------> Lower unit Purchase Price¶
In [30]:
# Ploting the effect of order size on unit purchase price
# Higher Order Size ------> Lower unit Purchase Price
fig = px.box(
df,
x="OrderSize",
y="UnitPurchasePrice",
color="OrderSize",
title="Effect of Order Size on Unit Purchase Price"
)
fig.update_layout(height=600, width=800, showlegend=False)
fig.show()
Effect of Odrder size in Unit purchase price¶
- Vendor buying in bulk (Very Large - OrderSize) gets the lowest unit price($10.172097 per unit), meaning higher mergin if they can manage inventory efficiently.
- The price difference between Small and Medium Order Size is substantial. (~ 66.04% reduction in unit cost)
- This suggests that bulk pricing strategies successfully encourage Vendors to purchase in large volumes, leading to higher average sales despite lower per unit revenue.
Which vendors have low stock turnover, indicating excess stock and slow moving products ?¶
In [31]:
df[df['StockTurnover']< 1].groupby('VendorName')[['StockTurnover']].mean().reset_index().sort_values('StockTurnover', ascending= True).head(10)
Out[31]:
| VendorName | StockTurnover | |
|---|---|---|
| 0 | ALISA CARR BEVERAGES | 0.615385 |
| 36 | HIGHLAND WINE MERCHANTS LLC | 0.708333 |
| 60 | PARK STREET IMPORTS LLC | 0.751306 |
| 19 | Circa Wines | 0.755676 |
| 26 | Dunn Wine Brokers | 0.766022 |
| 15 | CENTEUR IMPORTS LLC | 0.773953 |
| 78 | SMOKY QUARTZ DISTILLERY LLC | 0.783835 |
| 90 | TAMWORTH DISTILLING | 0.797078 |
| 91 | THE IMPORTED GRAPE LLC | 0.807569 |
| 101 | WALPOLE MTN VIEW WINERY | 0.820548 |
How much stock is locked in unsold inventory per vendor and which vendors contribute the most to it ?¶
In [32]:
df['TotalUnsoldInventoryValue'] = (df['TotalPurchaseQuantity'] - df['TotalSalesQuantity']) * df['PurchasePrice']
print(f'{format_dollar(df['TotalUnsoldInventoryValue'].sum())} is the total value of unsold inventory across all vendors.')
2.71M is the total value of unsold inventory across all vendors.
In [33]:
# Aggregate capital locked in unsold inventory per vendor
inventory_velue_per_vendor = df.groupby('VendorName')['TotalUnsoldInventoryValue'].sum().reset_index()
# Sort Vendors with highest unsold loked inventory value
inventory_velue_per_vendor = inventory_velue_per_vendor.sort_values(by='TotalUnsoldInventoryValue', ascending=False)
inventory_velue_per_vendor['TotalUnsoldInventoryValue'] = inventory_velue_per_vendor ['TotalUnsoldInventoryValue'].apply(format_dollar)
inventory_velue_per_vendor.head(10)
Out[33]:
| VendorName | TotalUnsoldInventoryValue | |
|---|---|---|
| 25 | DIAGEO NORTH AMERICA INC | 722.21K |
| 46 | JIM BEAM BRANDS COMPANY | 554.67K |
| 68 | PERNOD RICARD USA | 470.63K |
| 116 | WILLIAM GRANT & SONS INC | 401.96K |
| 30 | E & J GALLO WINERY | 228.28K |
| 79 | SAZERAC CO INC | 198.44K |
| 11 | BROWN-FORMAN CORP | 177.73K |
| 20 | CONSTELLATION BRANDS INC | 133.62K |
| 61 | MOET HENNESSY USA INC | 126.48K |
| 77 | REMY COINTREAU USA INC | 118.60K |
What is the 95% confidence intervals for profit margins of top-performing vendors ?¶
In [34]:
top_threshold = df['TotalSalesDollars'].quantile(0.75)
low_threshold = df['TotalSalesDollars'].quantile(0.25)
In [35]:
top_vendors = df[df['TotalSalesDollars'] >= top_threshold]['ProfitMargin'].dropna()
low_vendors = df[df['TotalSalesDollars'] <= low_threshold]['ProfitMargin'].dropna()
In [36]:
def confidence_interval(data, confidence=0.95):
n = len(data)
mean = np.mean(data)
sem = stats.sem(data)
margin_err = sem * stats.t.ppf((1 + confidence) / 2., n-1)
return mean, mean - margin_err, mean + margin_err
In [37]:
top_mean, top_lower, top_upper = confidence_interval(top_vendors)
low_mean, low_lower, low_upper = confidence_interval(low_vendors)
In [38]:
print(f"Top Vendors Profit Margin: Mean={top_mean:.2f}, 95% CI=({top_lower:.2f}, {top_upper:.2f})")
print(f"Low Vendors Profit Margin: Mean={low_mean:.2f}, 95% CI=({low_lower:.2f}, {low_upper:.2f})")
fig = go.Figure()
# KDE for top vendors
top_kde = stats.gaussian_kde(top_vendors)
x_top = np.linspace(top_vendors.min(), top_vendors.max(), 200)
fig.add_trace(go.Scatter(
x=x_top, y=top_kde(x_top),
fill='tozeroy', name='Top Vendors', line=dict(color='blue')
))
# KDE for low vendors
low_kde = stats.gaussian_kde(low_vendors)
x_low = np.linspace(low_vendors.min(), low_vendors.max(), 200)
fig.add_trace(go.Scatter(
x=x_low, y=low_kde(x_low),
fill='tozeroy', name='Low Vendors', line=dict(color='orange')
))
# Mean lines
fig.add_vline(x=top_mean, line_dash="dash", line_color="blue", annotation_text="Top Mean", annotation_position="top left")
fig.add_vline(x=low_mean, line_dash="dash", line_color="orange", annotation_text="Low Mean", annotation_position="top right")
# 95% CI shaded regions
fig.add_shape(type="rect", x0=top_lower, x1=top_upper, y0=0, y1=max(top_kde(x_top).max(), low_kde(x_low).max()),
fillcolor="blue", opacity=0.15, line_width=0, layer="below")
fig.add_shape(type="rect", x0=low_lower, x1=low_upper, y0=0, y1=max(top_kde(x_top).max(), low_kde(x_low).max()),
fillcolor="orange", opacity=0.15, line_width=0, layer="below")
fig.update_layout(
title="Profit Margin Distribution: Top vs Low Vendors",
xaxis_title="Profit Margin",
yaxis_title="Density",
width=900, height=500
)
fig.show()
Top Vendors Profit Margin: Mean=31.18, 95% CI=(30.74, 31.61) Low Vendors Profit Margin: Mean=41.57, 95% CI=(40.50, 42.64)
- The coonfidance interval for low performing vendors (40.48% - 42.62%) is significantly higher than that of top-performing vnedors (30.74% - 31.61).
- This suggestis that vendors with lower sales tend to maintain higher profir margins, potentially due to premium-pricing or lower operational-cost.
- For High-Performing Vendors: If they aim to improve profitibily, they could explore selective pricing adjustment, cost optimization or bundling strategies.
- For Low-Performing Vendors: Desoite their high profit margins, their low sales volume indicate a need for marketing, competitive pricing or improved distribution strategies.
Hypothesis testing¶
is there a significant difference in profit margins between top performing and low performing vendors?¶
- $H_0$ (Null Hypothesis): There is no significant difference in the mean profit margins and top-performing & low0performing vendors.
- $H_1$ (Alternative Hypothesis): The mean profit margins for top-performing & low-performing vendors are significantly different.
In [39]:
top_threshold = df['TotalSalesDollars'].quantile(0.75)
low_threshold = df['TotalSalesDollars'].quantile(0.25)
top_vendors = df[df['TotalSalesDollars'] >= top_threshold]['ProfitMargin'].dropna()
low_vendors = df[df['TotalSalesDollars'] <= low_threshold]['ProfitMargin'].dropna()
# Perform sample t-test
t_stat, p_value = ttest_ind(top_vendors, low_vendors, equal_var=False)
# Print results
print(f"T-statistic: {t_stat:.4f}, P-value: {p_value:.4f}")
if p_value < 0.05:
print("Reject the null hypothesis: There is a Significant difference in profit margins.")
else:
print("Fail to reject the null hypothesis: There is No significant difference in profit margins.")
T-statistic: -17.6695, P-value: 0.0000 Reject the null hypothesis: There is a Significant difference in profit margins.
In [ ]: